{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "WfGpInivS0fG"
   },
   "source": [
    "<h2 align=\"center\">点击下列图标在线运行HanLP</h2>\n",
    "<div align=\"center\">\n",
    "\t<a href=\"https://colab.research.google.com/github/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/con_stl.ipynb\" target=\"_blank\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\t<a href=\"https://mybinder.org/v2/gh/hankcs/HanLP/doc-zh?filepath=plugins%2Fhanlp_demo%2Fhanlp_demo%2Fzh%2Fcon_stl.ipynb\" target=\"_blank\"><img src=\"https://mybinder.org/badge_logo.svg\" alt=\"Open In Binder\"/></a>\n",
    "</div>\n",
    "\n",
    "## 安装"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "IYwV-UkNNzFp"
   },
   "source": [
    "无论是Windows、Linux还是macOS，HanLP的安装只需一句话搞定："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1Uf_u7ddMhUt"
   },
   "outputs": [],
   "source": [
    "!pip install hanlp -U"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pp-1KqEOOJ4t"
   },
   "source": [
    "## 加载模型\n",
    "HanLP的工作流程是先加载模型，模型的标示符存储在`hanlp.pretrained`这个包中，按照NLP任务归类。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "id": "0tmKBu7sNAXX"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'CTB9_CON_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/constituency/ctb9_con_electra_small_20220215_230116.zip',\n",
       " 'CTB9_CON_FULL_TAG_ELECTRA_SMALL': 'https://file.hankcs.com/hanlp/constituency/ctb9_full_tag_con_electra_small_20220118_103119.zip'}"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import hanlp\n",
    "hanlp.pretrained.constituency.ALL # 语种见名称最后一个字段或相应语料库"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "EmZDmLn9aGxG"
   },
   "source": [
    "调用`hanlp.load`进行加载，模型会自动下载到本地缓存。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "con = hanlp.load('CTB9_CON_FULL_TAG_ELECTRA_SMALL')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "elA_UyssOut_"
   },
   "source": [
    "## 短语句法分析\n",
    "输入为已分词的一个或多个句子："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 70
    },
    "id": "BqEmDMGGOtk3",
    "outputId": "2a0d392f-b99a-4a18-fc7f-754e2abe2e34"
   },
   "outputs": [],
   "source": [
    "trees = con([[\"2021年\", \"HanLPv2.1\", \"为\", \"生产\", \"环境\", \"带来\", \"次\", \"世代\", \"最\", \"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"], [\"阿婆主\", \"来到\", \"北京\", \"立方庭\", \"参观\", \"自然\", \"语义\", \"科技\", \"公司\", \"。\"]], tasks='con')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "返回值为一个`Tree`的数组:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[['TOP', [['IP', [['NP-TMP', [['_', ['2021年']]]], ['NP-PN-SBJ', [['_', ['HanLPv2.1']]]], ['VP', [['PP-BNF', [['_', ['为']], ['NP', [['_', ['生产']], ['_', ['环境']]]]]], ['VP', [['_', ['带来']], ['NP-OBJ', [['CP', [['CP', [['IP', [['VP', [['NP', [['DP', [['_', ['次']]]], ['NP', [['_', ['世代']]]]]], ['ADVP', [['_', ['最']]]], ['VP', [['_', ['先进']]]]]]]], ['_', ['的']]]]]], ['NP', [['QP', [['_', ['多']]]], ['NP', [['_', ['语种']]]]]], ['NP', [['_', ['NLP']], ['_', ['技术']]]]]]]]]], ['_', ['。']]]]]], ['TOP', [['IP', [['NP-SBJ', [['_', ['阿婆主']]]], ['VP', [['VP', [['_', ['来到']], ['NP-OBJ', [['_', ['北京']], ['NP-PN', [['_', ['立方庭']]]]]]]], ['VP', [['_', ['参观']], ['NP-OBJ', [['_', ['自然']], ['_', ['语义']], ['_', ['科技']], ['_', ['公司']]]]]]]], ['_', ['。']]]]]]]\n"
     ]
    }
   ],
   "source": [
    "print(trees)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "转换为bracketed格式："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(TOP\n",
      "  (IP\n",
      "    (NP-TMP (_ 2021年))\n",
      "    (NP-PN-SBJ (_ HanLPv2.1))\n",
      "    (VP\n",
      "      (PP-BNF (_ 为) (NP (_ 生产) (_ 环境)))\n",
      "      (VP\n",
      "        (_ 带来)\n",
      "        (NP-OBJ\n",
      "          (CP\n",
      "            (CP\n",
      "              (IP\n",
      "                (VP\n",
      "                  (NP (DP (_ 次)) (NP (_ 世代)))\n",
      "                  (ADVP (_ 最))\n",
      "                  (VP (_ 先进))))\n",
      "              (_ 的)))\n",
      "          (NP (QP (_ 多)) (NP (_ 语种)))\n",
      "          (NP (_ NLP) (_ 技术)))))\n",
      "    (_ 。)))\n"
     ]
    }
   ],
   "source": [
    "print(trees[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 组装流水线"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "短语成分树的第一层non-terminal一般是词性标签，所以经常与词性标注一起使用。为此，先加载一个词性标注器："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然后创建一个函数将词性标签和句法树组装起来:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hanlp_common.document import Document\n",
    "def merge_pos_into_con(doc:Document):\n",
    "    flat = isinstance(doc['pos'][0], str)\n",
    "    if flat:\n",
    "        doc = Document((k, [v]) for k, v in doc.items())\n",
    "    for tree, tags in zip(doc['con'], doc['pos']):\n",
    "        offset = 0\n",
    "        for subtree in tree.subtrees(lambda t: t.height() == 2):\n",
    "            tag = subtree.label()\n",
    "            if tag == '_':\n",
    "                subtree.set_label(tags[offset])\n",
    "            offset += 1\n",
    "    if flat:\n",
    "        doc = doc.squeeze()\n",
    "    return doc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "之后就可以用一个流水线将三者组装起来了："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "nlp = hanlp.pipeline() \\\n",
    "    .append(pos, input_key='tok', output_key='pos') \\\n",
    "    .append(con, input_key='tok', output_key='con') \\\n",
    "    .append(merge_pos_into_con, input_key='*')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "该流水线的结构如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[tok->TransformerTagger->pos, tok->CRFConstituencyParser->con, None->merge_pos_into_con->None]\n"
     ]
    }
   ],
   "source": [
    "print(nlp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "传入一个已分词的句子试试："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"tok\": [\n",
      "    \"2021年\",\n",
      "    \"HanLPv2.1\",\n",
      "    \"带来\",\n",
      "    \"最\",\n",
      "    \"先进\",\n",
      "    \"的\",\n",
      "    \"多\",\n",
      "    \"语种\",\n",
      "    \"NLP\",\n",
      "    \"技术\",\n",
      "    \"。\"\n",
      "  ],\n",
      "  \"pos\": [\n",
      "    \"NT\",\n",
      "    \"NR\",\n",
      "    \"VV\",\n",
      "    \"AD\",\n",
      "    \"VA\",\n",
      "    \"DEC\",\n",
      "    \"CD\",\n",
      "    \"NN\",\n",
      "    \"NR\",\n",
      "    \"NN\",\n",
      "    \"PU\"\n",
      "  ],\n",
      "  \"con\": [\n",
      "    \"TOP\",\n",
      "    [[\"IP\", [[\"NP-TMP\", [[\"NT\", [\"2021年\"]]]], [\"NP-PN-SBJ\", [[\"NR\", [\"HanLPv2.1\"]]]], [\"VP\", [[\"VV\", [\"带来\"]], [\"NP-OBJ\", [[\"CP\", [[\"CP\", [[\"IP\", [[\"VP\", [[\"ADVP\", [[\"AD\", [\"最\"]]]], [\"VP\", [[\"VA\", [\"先进\"]]]]]]]], [\"DEC\", [\"的\"]]]]]], [\"NP\", [[\"QP\", [[\"CD\", [\"多\"]]]], [\"NP\", [[\"NN\", [\"语种\"]]]]]], [\"NP\", [[\"NR\", [\"NLP\"]], [\"NN\", [\"技术\"]]]]]]]], [\"PU\", [\"。\"]]]]]\n",
      "  ]\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "doc = nlp(tok=[\"2021年\", \"HanLPv2.1\", \"带来\", \"最\", \"先进\", \"的\", \"多\", \"语种\", \"NLP\", \"技术\", \"。\"])\n",
    "print(doc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "流水线的输出也是一个Document，所以支持可视化："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div style=\"display: table; line-height: 128%;\"><pre style=\"display: table-cell; font-family: SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace; white-space: nowrap;\">Token&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>─────────&nbsp;<br>2021年&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>HanLPv2.1&nbsp;<br>带来&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>最&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>先进&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>的&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>多&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>语种&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>NLP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>技术&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br>。&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</pre><pre style=\"display: table-cell; font-family: SFMono-Regular,Menlo,Monaco,Consolas,Liberation Mono,Courier New,monospace; white-space: nowrap;\">PoS&nbsp;&nbsp;&nbsp;&nbsp;3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;7&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;9&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;10<br>────────────────────────────────────────────────────────────────────────<br>NT&nbsp;─────────────────────────────────────────────────────►NP-TMP&nbsp;────┐&nbsp;&nbsp;&nbsp;<br>NR&nbsp;─────────────────────────────────────────────────────►NP-PN-SBJ──┤&nbsp;&nbsp;&nbsp;<br>VV&nbsp;────────────────────────────────────────────────────┐&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;<br>AD&nbsp;───►ADVP──┐&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;<br>VA&nbsp;───►VP&nbsp;───┴►VP&nbsp;────►IP&nbsp;───┐&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;<br>DEC──────────────────────────┴►CP&nbsp;────►CP&nbsp;───┐&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;├►VP─────────┼►IP<br>CD&nbsp;───►QP&nbsp;───┐&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;<br>NN&nbsp;───►NP&nbsp;───┴────────────────────────►NP────┼►NP-OBJ──┘&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;<br>NR&nbsp;──┐&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;<br>NN&nbsp;──┴────────────────────────────────►NP&nbsp;───┘&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;<br>PU&nbsp;─────────────────────────────────────────────────────────────────┘&nbsp;&nbsp;&nbsp;</pre></div>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "doc.pretty_print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "如果要分析原始文本的话，分词是第一步，所以先加载一个分词器："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然后将分词器插入到流水线的第一级："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[None->TransformerTaggingTokenizer->tok,\n",
       " tok->TransformerTagger->pos,\n",
       " tok->CRFConstituencyParser->con,\n",
       " None->merge_pos_into_con->None]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nlp.insert(0, tok, output_key='tok')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "然后就可以直接分析原始文本了："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(TOP\n",
      "  (IP\n",
      "    (NT 2021)\n",
      "    (M 年)\n",
      "    (NP-PN-SBJ (NR HanLPv2.1))\n",
      "    (VP\n",
      "      (VV 带来)\n",
      "      (NP-OBJ\n",
      "        (CP (CP (IP (VP (ADVP (AD 最)) (VP (VA 先进)))) (DEC 的)))\n",
      "        (NP (QP (CD 多)) (NP (NN 语种)))\n",
      "        (NP (NR NLP) (NN 技术))))\n",
      "    (PU 。)))\n"
     ]
    }
   ],
   "source": [
    "print(nlp('2021年HanLPv2.1带来最先进的多语种NLP技术。')['con'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "你明白吗？HanLP是为聪明人设计的，只要你足够聪明，你就可以优雅地实现各种功能。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 操作短语树的技巧"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "短语结构树的类型为`phrasetree.tree.Tree`，提供了许多接口，此处列举其中一些常用的接口。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(TOP\n",
      "  (IP\n",
      "    (NP-TMP (NT 2021年))\n",
      "    (NP-PN-SBJ (NR HanLPv2.1))\n",
      "    (VP\n",
      "      (VV 带来)\n",
      "      (NP-OBJ\n",
      "        (CP (CP (IP (VP (ADVP (AD 最)) (VP (VA 先进)))) (DEC 的)))\n",
      "        (NP (QP (CD 多)) (NP (NN 语种)))\n",
      "        (NP (NR NLP) (NN 技术))))\n",
      "    (PU 。)))\n"
     ]
    }
   ],
   "source": [
    "tree = doc['con'] # tree数组的话则需要doc['con'][0]\n",
    "print(tree)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 按高度枚举子树"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "子树：(VP (ADVP (AD 最)) (VP (VA 先进)))\t标签：VP\t短语：['最', '先进']\n",
      "子树：(NP (QP (CD 多)) (NP (NN 语种)))\t标签：NP\t短语：['多', '语种']\n"
     ]
    }
   ],
   "source": [
    "for subtree in tree.subtrees(lambda t: t.height() == 4):\n",
    "    print(f'子树：{subtree}\\t标签：{subtree.label()}\\t短语：{subtree.leaves()}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 按标签枚举子树"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(NP (QP (CD 多)) (NP (NN 语种)))\n",
      "(NP (NN 语种))\n",
      "(NP (NR NLP) (NN 技术))\n"
     ]
    }
   ],
   "source": [
    "for subtree in tree.subtrees(lambda t: t.label() == 'NP'):\n",
    "    print(subtree)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 遍历子节点"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "父节点(NP (NR NLP) (NN 技术))的子节点有：\n",
      "(NR NLP)\n",
      "(NN 技术)\n"
     ]
    }
   ],
   "source": [
    "print(f'父节点{subtree}的子节点有：')\n",
    "for child in subtree:\n",
    "    print(child)"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "collapsed_sections": [],
   "name": "con_stl.ipynb",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}